This page contains the code used to generate personal network (ego network) analyses on Twitter.

Set up

library(rtweet)
source("createTokens.R")  ## private keys and tokens
source("rtweet_functions.R") ## functions for working with multiple tokens

library(tidyverse)
library(igraph)
library(tidygraph)
library(ggraph)
library(ggwordcloud)
library(tidytext)
theme_set(theme_custom())

The first step is to choose a focal user (or “ego”) around whom we build a personal network.

ego <- "acastroaraujo" # Me
ego_info <- lookup_users(ego, token = sample(token, 1))

ego_info$followers_count
## [1] 780

Name: andrés castro araújo

User: acastroaraujo

Followers: 780

Friends: 864

Joined Twitter on 2010-05-10 04:54:27

This analysis is divided into three parts.

  1. The focal user's network of followers
  2. The focal user's network of “friends” (the accounts the focal user follows)
  3. The focal user's network of friend-followers (mutuals)

Each of these three dimensions corresponds to a different flow of interaction. The first consists of the users who receive information from acastroaraujo, the second of the users who generate the information acastroaraujo receives, and the third of the users with whom the flow of information is reciprocal.
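
As a minimal sketch of how the three groups relate, assuming small hypothetical ID vectors (not real accounts), the mutuals are simply the intersection of followers and friends:

```r
# Hypothetical user IDs, for illustration only
followers <- c("101", "102", "103", "104")  # accounts that follow the ego
friends   <- c("103", "104", "105")         # accounts the ego follows

# Users in both sets: information flows in both directions
mutuals <- intersect(followers, friends)

# Users in only one of the two sets: information flows one way
followers_only <- setdiff(followers, friends)
friends_only   <- setdiff(friends, followers)

print(mutuals)
```

The mutuals network built later on this page applies the same idea to whole edge lists rather than to two plain vectors.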

This code is freely available, except for the private keys and tokens, which you obtain by opening a developer account at https://developer.twitter.com/

Followers network

The following code retrieves the list of acastroaraujo's followers (each one identified by a user_id).

ego_followers <- get_followers(ego, token = sample(token, 1))
ego_followers
## # A tibble: 780 x 1
##    user_id            
##    <chr>              
##  1 363863237          
##  2 892820477747572738 
##  3 26968974           
##  4 1015741492411891713
##  5 615478352          
##  6 116055146          
##  7 492186603          
##  8 957754266520903686 
##  9 1921562414         
## 10 952439346          
## # … with 770 more rows

This user_id is unique to each account and stays the same even when the user changes their screen name.
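
A practical aside, sketched under the assumption that IDs might otherwise be converted to numbers: the newer IDs have 18 or 19 digits, more than a double can represent exactly, so it is safer to keep user_id as a character column (as in the <chr> column above). The second ID below is hypothetical, made up only for the comparison.

```r
# Two distinct IDs that differ only in the last digit
# (id_b is hypothetical, invented for this comparison)
id_a <- "892820477747572738"
id_b <- "892820477747572739"

# As character strings they are different...
ids_differ_as_text <- id_a != id_b

# ...but doubles cannot represent every integer above 2^53,
# so both strings convert to the same number
ids_collide_as_numbers <- as.numeric(id_a) == as.numeric(id_b)

print(c(text = ids_differ_as_text, numeric = ids_collide_as_numbers))
```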

The following code creates a folder named *_friends_of_followers/ that stores, for each of these followers, the list of their friends (the accounts they follow).

Depending on the number of users and the number of tokens, this can take several hours (or even days).
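
A rough back-of-the-envelope estimate shows why. The sketch below assumes a limit of 15 friend-list requests per 15-minute window per token (the standard Twitter API limit at the time, stated here as an assumption) and up to 5,000 IDs per request. `estimate_hours()` is a hypothetical helper, not part of rtweet.

```r
# Hypothetical helper: rough download time, ignoring network latency
estimate_hours <- function(friends_counts, n_tokens,
                           requests_per_window = 15,  # assumed limit per token
                           window_minutes = 15,
                           ids_per_request = 5000) {
  # each user needs one request per block of 5,000 friends (at least one)
  n_requests <- sum(pmax(1, ceiling(friends_counts / ids_per_request)))
  # all tokens together can make this many requests per window
  n_windows <- ceiling(n_requests / (requests_per_window * n_tokens))
  n_windows * window_minutes / 60
}

# 780 users who each follow fewer than 5,000 accounts (made-up counts)
friends_counts <- rep(1000, 780)

hours_one_token   <- estimate_hours(friends_counts, n_tokens = 1)
hours_four_tokens <- estimate_hours(friends_counts, n_tokens = 4)

print(c(one_token = hours_one_token, four_tokens = hours_four_tokens))
```

Under these assumptions a single token needs about 13 hours for 780 users, and users who follow more than 5,000 accounts push the total higher.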

outfolder <- paste0(ego, "_friends_of_followers/")
if (!dir.exists(outfolder)) dir.create(outfolder)
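## users already saved to disk are skipped, so an interrupted run can resume
users_done <- str_remove(dir(outfolder), "\\.rds$")
users_left <- setdiff(ego_followers$user_id, users_done)

while (length(users_left) > 0) { 
  
  new_user <- users_left[[1]]
  
  ## try() keeps the loop going and stores the error (e.g. protected accounts)
  friends_of_user <- try(multi_get_friends(new_user, token))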
  
  file_name <- str_glue("{outfolder}{new_user}.rds")
  write_rds(friends_of_user, file_name, compress = "gz")
  users_left <- users_left[-which(users_left %in% new_user)] ## int. subset
  
}
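
The resumable pattern used by this loop can be tried without any API access. The sketch below is a toy version that writes to a temporary folder; `fake_fetch()` is a hypothetical stand-in for `multi_get_friends()`, and the user IDs are made up.

```r
# Hypothetical stand-in for the real download function
fake_fetch <- function(user) {
  if (user == "3") stop("protected account")
  data.frame(from = user, to = c("a", "b"))
}

# Start from a clean temporary folder
outfolder <- file.path(tempdir(), "resumable_demo")
unlink(outfolder, recursive = TRUE)
dir.create(outfolder)

all_users <- c("1", "2", "3")

# Pretend a previous run already saved user "1"
saveRDS(fake_fetch("1"), file.path(outfolder, "1.rds"))

# Any user with a file on disk is considered done
users_done <- sub("\\.rds$", "", list.files(outfolder))
users_left <- setdiff(all_users, users_done)

for (user in users_left) {
  # try() returns the error object instead of stopping the loop
  result <- try(fake_fetch(user), silent = TRUE)
  saveRDS(result, file.path(outfolder, paste0(user, ".rds")))
}

saved_files <- sort(list.files(outfolder))
print(saved_files)
```

Because failures are saved as well, a failed user is not retried on the next run; the stored error objects have to be filtered out later, when the files are read back.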

For some users this information cannot be retrieved because their accounts are protected.

In this case, no information could be obtained for 6.4% of acastroaraujo's followers.
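
The computation behind that percentage is not shown on this page. One way it could be done, sketched here with a hypothetical list of results, is to flag the elements where `try()` stored an error:

```r
# Hypothetical results: two successful downloads and one failure
results <- list(
  data.frame(from = "1", to = "a"),
  try(stop("protected account"), silent = TRUE),
  data.frame(from = "2", to = "b")
)

# TRUE for every element that is a stored error
failed <- vapply(results, inherits, logical(1), what = "try-error")

# Share of users without information, as a percentage
pct_failed <- round(100 * mean(failed), 1)

# Keep only the successful downloads; logical indexing also works
# when nothing failed, unlike results[-which(failed)]
ok <- results[!failed]

print(pct_failed)
print(length(ok))
```

The logical index matters because a negative index built from an empty `which()` result selects nothing at all, which would silently drop every download.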

Edge list

To build the network, we take the full list of users and their friends and arrange it in two columns, where each row indicates one user (from) following another user (to).
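
As a small illustration of that layout, assuming a hypothetical named list of friend lists (names are users, values are the accounts they follow), an edge list can be assembled like this:

```r
# Hypothetical friend lists, for illustration only
friends_by_user <- list(
  u1 = c("u2", "u3"),
  u2 = c("u3"),
  u3 = c("u1", "u2", "u4")
)

# One row per "from follows to" relationship
edge_list <- data.frame(
  from = rep(names(friends_by_user), lengths(friends_by_user)),
  to   = unlist(friends_by_user, use.names = FALSE),
  stringsAsFactors = FALSE
)

print(edge_list)
```

Note that "u4" appears only in the to column: it is followed by someone in the list but is not itself one of the listed users, which is why such rows are filtered out further down.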

edge_list <- list.files(outfolder, full.names = TRUE) %>% 
  map(read_rds) %>% 
  discard(~ inherits(.x, "try-error")) %>%  ## skip users whose download failed
  bind_rows() 

edge_list
## # A tibble: 1,391,926 x 2
##    from                to        
##    <chr>               <chr>     
##  1 1001194679977893888 106228188 
##  2 1001194679977893888 303862998 
##  3 1001194679977893888 53279593  
##  4 1001194679977893888 89109653  
##  5 1001194679977893888 47514423  
##  6 1001194679977893888 91831163  
##  7 1001194679977893888 150638911 
##  8 1001194679977893888 405729991 
##  9 1001194679977893888 350926847 
## 10 1001194679977893888 4853185695
## # … with 1,391,916 more rows

There are 1,391,926 connections here. However, they include connections to users beyond those who follow acastroaraujo.

We can also retrieve metadata for each user.

ego_followers_info <- lookup_users(ego_followers$user_id, token = sample(token, 1))
write_rds(ego_followers_info, paste0(ego, "_follower_info.rds"), compress = "gz")

ego_followers_info <- read_rds(paste0(ego, "_follower_info.rds")) %>% 
  filter(!protected) %>% 
  select(
    user_id, screen_name, lang, name, location, description,
    ends_with("count"), -starts_with("quote"), 
    -starts_with("retweet"), -reply_count,
    -starts_with("fav")
    ) %>% 
    rename(name = user_id, user_name = name)

id_dict <- ego_followers_info %>% 
  select(name, screen_name) %>% 
  deframe()
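
For readers unfamiliar with `deframe()`, here is a minimal sketch of what `id_dict` looks like and how it can be used, with hypothetical IDs and handles:

```r
library(tibble)

# Hypothetical IDs and screen names
users <- tibble(
  name        = c("111", "222", "333"),
  screen_name = c("alice", "bob", "carol")
)

# deframe() turns a two-column tibble into a named vector:
# the first column becomes the names, the second the values
id_dict <- deframe(users)

# Translate IDs into screen names by indexing with the ID
looked_up <- id_dict[c("333", "111")]
print(looked_up)
```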

For example, this is the information for the followers of acastroaraujo who have the most followers themselves.

ego_followers_info %>% 
  arrange(desc(followers_count)) %>% 
  select(screen_name, description, location, followers_count, friends_count)
## # A tibble: 730 x 5
##    screen_name  description             location   followers_count friends_count
##    <chr>        <chr>                   <chr>                <int>         <int>
##  1 RodrigoUpri… "Investigador @Dejusti… "Colombia"          143076           558
##  2 Rivas_Santi… "Director/presentador … ""                  132162          9233
##  3 FundarMexico "Organización plural, … "Ciudad d…          115607         15113
##  4 CVderoux     "Ex concejal de Bogotá… "Bogotá D…          114916         15145
##  5 JuanitaGoe   "Representante a la Cá… "Bogotá, …          111332          7405
##  6 Bejumero     "Alcalde de Bejuma 201… "Carabobo…           95466         72899
##  7 Popeye_leye… ""                      "Colombia"           80287          4220
##  8 Dejusticia   "Centro de Estudios de… "Bogotá, …           67441          2145
##  9 JoseOMorera  "#Profesor #Autor #Con… "Bogotá, …           60723         58979
## 10 Pacifistacol "Una plataforma para l… "Colombia"           50959          2208
## # … with 720 more rows

Finally, we are interested in the personal network of acastroaraujo's followers, so we remove the connections that involve users outside his 780 followers (in practice, the 730 with public accounts, since protected accounts were filtered out above).

edge_list <- edge_list %>% 
  filter(to %in% ego_followers_info$name) %>% 
  filter(from %in% ego_followers_info$name)

edge_list
## # A tibble: 20,977 x 2
##    from                to        
##    <chr>               <chr>     
##  1 1001194679977893888 142469128 
##  2 1001194679977893888 36087400  
##  3 1001194679977893888 14063051  
##  4 1001194679977893888 48253393  
##  5 1001194679977893888 382592033 
##  6 1001194679977893888 410228042 
##  7 1001194679977893888 12542002  
##  8 1004030798125821953 76678975  
##  9 1004030798125821953 162139278 
## 10 1004030798125821953 1894875410
## # … with 20,967 more rows

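The personal network of acastroaraujo's followers that we could reconstruct has 20,977 connections among the 730 followers with public accounts. Only the 703 users who appear in at least one of those connections become nodes in the graph printed below.

Personal network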

ego_network <- edge_list %>% 
  tidygraph::as_tbl_graph() %>% 
  left_join(ego_followers_info) %>% 
  rename(name = screen_name, user_id = name) %>% 
  select(name, everything())

ego_network
## # A tbl_graph: 703 nodes and 20977 edges
## #
## # A directed simple graph with 1 component
## #
## # Node Data: 703 x 10 (active)
##   name  user_id lang  user_name location description followers_count
##   <chr> <chr>   <chr> <chr>     <chr>    <chr>                 <int>
## 1 Buho… 100119… es    Búho Sol… ""       ""                       42
## 2 Nava… 100403… es    Esteban … "Bogotá… "Gender & …             583
## 3 Mara… 100415… <NA>  Mararía … ""       ""                       14
## 4 Davi… 100418… <NA>  David     ""       ""                        1
## 5 jarj… 100784… en    Alexande… "Bogota… "Economist…              84
## 6 JFer… 100992… en    J. Ferna… "Extrem… "Hyperacti…             396
## # … with 697 more rows, and 3 more variables: friends_count <int>,
## #   listed_count <int>, statuses_count <int>
## #
## # Edge Data: 20,977 x 2
##    from    to
##   <int> <int>
## 1     1   158
## 2     1   413
## 3     1   152
## # … with 20,974 more rows

## Descriptive statistics

ego_network <- ego_network %>% 
  mutate(
    out_degree = centrality_degree(mode = "out"),
    in_degree = centrality_degree(mode = "in"),
    betweenness = centrality_betweenness(directed = TRUE),
    authority_score = centrality_authority(),
    eigen_centrality = centrality_eigen(directed = TRUE)
  )

ego_network 
## # A tbl_graph: 703 nodes and 20977 edges
## #
## # A directed simple graph with 1 component
## #
## # Node Data: 703 x 15 (active)
##   name  user_id lang  user_name location description followers_count
##   <chr> <chr>   <chr> <chr>     <chr>    <chr>                 <int>
## 1 Buho… 100119… es    Búho Sol… ""       ""                       42
## 2 Nava… 100403… es    Esteban … "Bogotá… "Gender & …             583
## 3 Mara… 100415… <NA>  Mararía … ""       ""                       14
## 4 Davi… 100418… <NA>  David     ""       ""                        1
## 5 jarj… 100784… en    Alexande… "Bogota… "Economist…              84
## 6 JFer… 100992… en    J. Ferna… "Extrem… "Hyperacti…             396
## # … with 697 more rows, and 8 more variables: friends_count <int>,
## #   listed_count <int>, statuses_count <int>, out_degree <dbl>,
## #   in_degree <dbl>, betweenness <dbl>, authority_score <dbl>,
## #   eigen_centrality <dbl>
## #
## # Edge Data: 20,977 x 2
##    from    to
##   <int> <int>
## 1     1   158
## 2     1   413
## 3     1   152
## # … with 20,974 more rows
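
To make the two degree measures concrete, here is a minimal sketch on a hypothetical four-edge network. It calls igraph directly; tidygraph's centrality_degree() is a wrapper around the same igraph function.

```r
library(igraph)

# Hypothetical edge list: each row means "from follows to"
edges <- data.frame(
  from = c("a", "a", "b", "c"),
  to   = c("b", "c", "c", "a")
)
g <- graph_from_data_frame(edges, directed = TRUE)

# In-degree: how many users in the network follow this user
in_deg <- degree(g, mode = "in")

# Out-degree: how many users in the network this user follows
out_deg <- degree(g, mode = "out")

print(rbind(in_deg, out_deg))
```

Here "c" is followed by both "a" and "b", so it has the highest in-degree, while "a" follows two users and has the highest out-degree.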

The following plot shows each user's influence on Twitter as a whole (horizontal axis, number of followers) versus each user's influence within the personal network of followers (vertical axis, in-degree).

ego_network %>% 
  as_tibble() %>% 
  #filter(in_degree > 5) %>% 
  ggplot(aes(followers_count, in_degree)) + 
  geom_point() 

ego_network %>% 
  as_tibble() %>% 
  mutate(label_name = ifelse(
    test = rank(-followers_count) <= 10 | rank(-in_degree) <= 10, 
    yes = name, 
    no = NA_character_)
    ) %>% 
  ggplot(aes(followers_count, in_degree)) + 
  geom_point() + 
  ggrepel::geom_label_repel(aes(label = label_name), size = 3)
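
The labeling rule in the second plot can be checked on a toy table. This sketch uses hypothetical values and a cutoff of 2 instead of 10:

```r
# Hypothetical users with two measures of prominence
users <- data.frame(
  name            = c("a", "b", "c", "d", "e"),
  followers_count = c(50000, 120, 900, 30, 4000),
  in_degree       = c(3, 40, 25, 1, 2),
  stringsAsFactors = FALSE
)

top_n <- 2  # the code above uses 10

# rank(-x) gives rank 1 to the largest value
is_top <- rank(-users$followers_count) <= top_n |
  rank(-users$in_degree) <= top_n

# Everyone outside the top ranks gets NA instead of a label
users$label_name <- ifelse(is_top, users$name, NA_character_)

print(users)
```

A user is labeled if it ranks near the top on either measure, so the number of labels can be anywhere between the cutoff and twice the cutoff.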

Clusters

set.seed(123)
clusters <- igraph::cluster_walktrap(graph = ego_network, steps = 7)

cluster_df <- tibble(cluster = factor(clusters$membership), name = clusters$names) 
  
cluster_df <- cluster_df %>% 
  group_by(cluster) %>% 
  filter(n() >= 10) %>% 
  ungroup()

ego_network <- ego_network %>% 
  left_join(cluster_df)
## Joining, by = "name"

ego_network %>% 
  as_tibble() %>% 
  arrange(desc(in_degree)) %>% 
  filter(!is.na(cluster)) %>% 
  group_by(cluster) %>%
  filter(rank(-authority_score) <= 30) %>% 
  ggplot(aes(label = name, size = log(in_degree), color = in_degree)) + 
  geom_text_wordcloud_area(family = "Avenir Next Condensed") + 
  facet_wrap(~cluster) + 
  labs(title = "Prominent followers in each cluster") + 
  scale_color_gradient(low = "grey", high = "purple") 
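
As a sanity check on what the walktrap algorithm does, here is a minimal sketch on a hypothetical graph with an obvious two-group structure. It is a toy example, not the network analyzed on this page.

```r
library(igraph)

# Two groups of five fully connected users (nodes 1-5 and 6-10)...
g <- disjoint_union(make_full_graph(5), make_full_graph(5))
# ...joined by a single connection between user 1 and user 6
g <- add_edges(g, c(1, 6))

# Walktrap groups nodes that short random walks tend to stay within
wc <- cluster_walktrap(g, steps = 4)

print(membership(wc))
print(sizes(wc))
```

On real networks the boundaries are far less clear, and the result depends on the steps argument, so the cluster labels are best read as a rough partition.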

Size of each cluster:

ego_network %>% as_tibble() %>% count(cluster)
## # A tibble: 6 x 2
##   cluster     n
##   <fct>   <int>
## 1 2          13
## 2 4         470
## 3 6          15
## 4 8          10
## 5 12        150
## 6 <NA>       45

Which users act as “bridges”?

ego_network %>% 
  as_tibble() %>% 
  arrange(desc(betweenness)) %>% 
  select(name, description, location)
## # A tibble: 703 x 3
##    name        description                                      location        
##    <chr>       <chr>                                            <chr>           
##  1 malbarracin Santandereano en el exilio. Abogado, activista … "Bogotá"        
##  2 RAKarl      Colombia past + present; author of #ForgottenPe… "Tierra Fría"   
##  3 JuanitaGoe  Representante a la Cámara por Bogotá (2018-2022… "Bogotá, D.C., …
##  4 psanabria   Public Policy & Management Scholar | Profesor e… "Latin America" 
##  5 Rivas_Sant… Director/presentador de Puntos Capitales. Parte… ""              
##  6 MariaPradaU Abogada, Magíster en Antropología | Climate Pro… "Bogotá, Colomb…
##  7 Dejusticia  Centro de Estudios de Derecho, Justicia y Socie… "Bogotá, Colomb…
##  8 MajoAlRiv   Profesora | Socióloga | Desigualdad | Ciudades … ""              
##  9 EmilioLeho… Ph.D candidate @NUSociology. STS, social moveme… ""              
## 10 FulanoZulu… En defensa del pequeño ahorrador e inversionist… ""              
## # … with 693 more rows
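
Betweenness is the measure used here to find bridges. A minimal sketch on a hypothetical seven-node graph shows why: the node that connects two otherwise separate groups lies on the most shortest paths.

```r
library(igraph)

# Two triangles (1-2-3 and 5-6-7) linked only through node 4
g <- make_graph(
  c(1, 2,  2, 3,  1, 3,   # first triangle
    3, 4,  4, 5,          # node 4 is the only link between the groups
    5, 6,  6, 7,  5, 7),  # second triangle
  directed = FALSE
)

# Betweenness counts the shortest paths that pass through each node
btw <- betweenness(g)

# Position of the node with the highest betweenness
bridge <- which.max(btw)

print(btw)
print(bridge)
```

Node 4 has only two connections, fewer than its neighbors 3 and 5, yet it has the highest betweenness: bridging is about position rather than popularity.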

cols <- c("betweenness", "in_degree", "out_degree", "followers_count", "friends_count")

ego_network %>% 
  as_tibble() %>% 
  group_by(cluster) %>% 
  summarize(across(all_of(cols), mean)) %>% 
  arrange(desc(betweenness))
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 6 x 6
##   cluster betweenness in_degree out_degree followers_count friends_count
##   <fct>         <dbl>     <dbl>      <dbl>           <dbl>         <dbl>
## 1 2             2040.      7.92       7.85           3606.         3358.
## 2 6             1235.     10.5       10.6             265.          490.
## 3 4             1102.     35.9       35.4            4065.         2214.
## 4 12             666.     24.0       25.7             460.          745.
## 5 8              616.      4.5        5.2             905.         1434.
## 6 <NA>           609.      3.98       3.8            3934.         2815.

Subset

Given the information above, we can focus on particular segments of the personal network.

For example, we can focus exclusively on the users who belong to the groups labeled 12, 4, 2, and 6.

ego_network_subset <- ego_network %>% 
  filter(cluster %in% c(12, 4, 2, 6)) %>% 
    mutate(
    out_degree = centrality_degree(mode = "out"),
    in_degree = centrality_degree(mode = "in"),
    betweenness = centrality_betweenness(directed = TRUE),
    authority_score = centrality_authority(),
    eigen_centrality = centrality_eigen(directed = TRUE)
  ) 

ego_network_subset %>% 
  ggraph("mds") +
  geom_edge_fan(alpha = 1/5, width = 1/5) + 
  geom_node_point(aes(fill = cluster, size = in_degree), 
                  shape = 21, color = "white", show.legend = FALSE) 

ego_network_subset %>% 
  as_tibble() %>% 
  mutate(label_id = ifelse(
    test = rank(-betweenness) <= 10 |rank(-in_degree) <= 10, 
    yes = name, 
    no = NA_character_)
    ) %>% 
  ggplot(aes(betweenness, in_degree, color = cluster)) +
  geom_point() +
  ggrepel::geom_label_repel(aes(label = label_id), size = 3)

ego_network_subset %>% 
  group_by(cluster) %>% 
  mutate(label_name = ifelse(
    test = rank(-authority_score) <= 5 | rank(-betweenness) <= 5,
    yes = name,
    no = NA_character_
  )) %>% 
  ggraph("mds") +
  geom_edge_fan(alpha = 1/5, width = 1/5) + 
  geom_node_point(aes(fill = cluster, size = betweenness), 
                  shape = 21, color = "white", show.legend = FALSE) +
  geom_node_label(aes(label = label_name), 
                  repel = TRUE, alpha = 3/4, size = 3) 
## Ungrouping graph...

Friends network

This section repeats the previous analysis for the personal network of acastroaraujo's friends.

outfolder <- paste0(ego, "_friends_of_friends/")
if (!dir.exists(outfolder)) dir.create(outfolder)

ego_friends <- get_friends(ego, token = sample(token, 1))
ego_friends
## # A tibble: 864 x 2
##    user          user_id            
##    <chr>         <chr>              
##  1 acastroaraujo 760639303          
##  2 acastroaraujo 377284121          
##  3 acastroaraujo 1159264860233961473
##  4 acastroaraujo 1040310675497730050
##  5 acastroaraujo 784170814342127616 
##  6 acastroaraujo 175805255          
##  7 acastroaraujo 2363460824         
##  8 acastroaraujo 285698560          
##  9 acastroaraujo 605012771          
## 10 acastroaraujo 1093788314         
## # … with 854 more rows

users_done <- str_remove(dir(outfolder), "\\.rds$")
users_left <- setdiff(ego_friends$user_id, users_done)

while (length(users_left) > 0) { 
  
  new_user <- users_left[[1]]
  
  friends_of_user <- try(multi_get_friends(new_user, token))
  
  file_name <- str_glue("{outfolder}{new_user}.rds")
  write_rds(friends_of_user, file_name, compress = "gz")
  users_left <- users_left[-which(users_left %in% new_user)] ## int. subset
  
}

In this case, no information could be obtained for 2.4% of acastroaraujo's friends.

Edge list

edge_list <- list.files(outfolder, full.names = TRUE) %>% 
  map(read_rds) %>% 
  discard(~ inherits(.x, "try-error")) %>%  ## skip users whose download failed
  bind_rows()

edge_list
## # A tibble: 1,234,882 x 2
##    from                to                 
##    <chr>               <chr>              
##  1 1001511262545592320 1128081883923918848
##  2 1001511262545592320 717312300529618945 
##  3 1001511262545592320 1189495990119817217
##  4 1001511262545592320 1226598133205012480
##  5 1001511262545592320 1025538004167864321
##  6 1001511262545592320 39619991           
##  7 1001511262545592320 300049226          
##  8 1001511262545592320 1145686112         
##  9 1001511262545592320 1653264782         
## 10 1001511262545592320 983470194982088704 
## # … with 1,234,872 more rows

ego_friends_info <- lookup_users(ego_friends$user_id, token = sample(token, 1))
write_rds(ego_friends_info, paste0(ego, "_friends_info.rds"), compress = "gz")
ego_friends_info <- read_rds(paste0(ego, "_friends_info.rds")) %>% 
  filter(!protected) %>% 
  select(
    user_id, screen_name, lang, name, location, description,
    ends_with("count"), -starts_with("quote"), 
    -starts_with("retweet"), -reply_count,
    -starts_with("fav")
    ) %>% 
    rename(name = user_id, user_name = name)

id_dict <- ego_friends_info %>% 
  select(name, screen_name) %>% 
  deframe()

This is the information for the friends of acastroaraujo who have the most followers.

ego_friends_info %>% 
  arrange(desc(followers_count)) %>% 
  select(screen_name, description, location, followers_count, friends_count)
## # A tibble: 845 x 5
##    screen_name  description             location   followers_count friends_count
##    <chr>        <chr>                   <chr>                <int>         <int>
##  1 AOC          "US Representative,NY-… "Bronx + …        10430327          2902
##  2 NewYorker    "Unparalleled reportin… "New York…         8969567           373
##  3 hcapriles    "#YoSoyVenezolano"      "Venezuel…         7180260          1915
##  4 jack         "#bitcoin"              ""                 4996684          4537
##  5 paulkrugman  "Nobel laureate. Op-Ed… "New York…         4646688            67
##  6 DAVID_LYNCH  "Filmmaker. Born Misso… "Los Ange…         3337686            42
##  7 ClaudiaLopez "Primera Alcaldesa de … "Bogotá, …         2472410          2506
##  8 sarahcpr     "Watch my comedy speci… "New York…         2430096          3130
##  9 fdbedout     "Periodista, presentad… "Miami, F…         2277988           679
## 10 lasillavacia "La cuenta de Twitter … "Bogotá"           1278648          3248
## # … with 835 more rows

edge_list <- edge_list %>% 
  filter(to %in% ego_friends_info$name) %>% 
  filter(from %in% ego_friends_info$name)

edge_list
## # A tibble: 47,526 x 2
##    from                to                
##    <chr>               <chr>             
##  1 1001511262545592320 607752311         
##  2 1001511262545592320 69133574          
##  3 1001511262545592320 2167059661        
##  4 1001511262545592320 14247789          
##  5 1001511262545592320 742379544309567489
##  6 1001511262545592320 381642287         
##  7 1001511262545592320 2158970839        
##  8 1001511262545592320 13074042          
##  9 1001511262545592320 16284661          
## 10 1004030798125821953 323599188         
## # … with 47,516 more rows

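The personal network of acastroaraujo's friends that we could reconstruct has 47,526 connections among the 845 friends with public accounts. Only the 842 users who appear in at least one of those connections become nodes in the graph.

Personal network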

ego_network <- edge_list %>% 
  tidygraph::as_tbl_graph() %>% 
  left_join(ego_friends_info) %>% 
  rename(name = screen_name, user_id = name) %>% 
  select(name, everything())

## Descriptive statistics

ego_network <- ego_network %>% 
  mutate(
    out_degree = centrality_degree(mode = "out"),
    in_degree = centrality_degree(mode = "in"),
    betweenness = centrality_betweenness(directed = TRUE),
    authority_score = centrality_authority(),
    eigen_centrality = centrality_eigen(directed = TRUE)
  )

ego_network
## # A tbl_graph: 842 nodes and 47526 edges
## #
## # A directed simple graph with 1 component
## #
## # Node Data: 842 x 15 (active)
##   name  user_id lang  user_name location description followers_count
##   <chr> <chr>   <chr> <chr>     <chr>    <chr>                 <int>
## 1 WeAr… 100151… en    We are R… "The wh… "RoCur (Ro…           22487
## 2 Nava… 100403… es    Esteban … "Bogotá… "Gender & …             583
## 3 jarj… 100784… en    Alexande… "Bogota… "Economist…              84
## 4 JFer… 100992… es    J. Ferna… "Extrem… "Hyperacti…             396
## 5 chri… 101015… en    Christin… ""       "Sociologi…            2473
## 6 Cual… 101493… en    Cualquie… ""       "Founder a…             258
## # … with 836 more rows, and 8 more variables: friends_count <int>,
## #   listed_count <int>, statuses_count <int>, out_degree <dbl>,
## #   in_degree <dbl>, betweenness <dbl>, authority_score <dbl>,
## #   eigen_centrality <dbl>
## #
## # Edge Data: 47,526 x 2
##    from    to
##   <int> <int>
## 1     1   634
## 2     1   660
## 3     1   306
## # … with 47,523 more rows

The following plot shows each user's influence on Twitter as a whole (horizontal axis, number of followers) versus each user's influence within the personal network of friends (vertical axis, in-degree).

ego_network %>% 
  as_tibble() %>% 
  #filter(in_degree > 5) %>% 
  ggplot(aes(followers_count, in_degree)) + 
  geom_point() 

ego_network %>% 
  as_tibble() %>% 
  mutate(label_name = ifelse(
    test = rank(-followers_count) <= 10 | rank(-in_degree) <= 10, 
    yes = name, 
    no = NA_character_)
    ) %>% 
  ggplot(aes(followers_count, in_degree)) + 
  geom_point() + 
  ggrepel::geom_label_repel(aes(label = label_name), size = 3)

Clusters

clusters <- igraph::cluster_walktrap(graph = ego_network, steps = 12)

cluster_df <- tibble(cluster = factor(clusters$membership), name = clusters$names) 
  
cluster_df <- cluster_df %>% 
  group_by(cluster) %>% 
  filter(n() >= 10) %>% 
  ungroup()

ego_network <- ego_network %>% 
  left_join(cluster_df)
## Joining, by = "name"

ego_network %>% 
  as_tibble() %>% 
  arrange(desc(in_degree)) %>% 
  filter(!is.na(cluster)) %>% 
  group_by(cluster) %>%
  filter(rank(-authority_score) <= 50) %>% 
  ggplot(aes(label = name, size = log(in_degree), color = in_degree)) + 
  geom_text_wordcloud_area(family = "Avenir Next Condensed") + 
  facet_wrap(~cluster) + 
  labs(title = "Prominent friends in each cluster") + 
  scale_color_gradient(low = "grey", high = "purple") 

Size of each cluster:

ego_network %>% as_tibble() %>% count(cluster)
## # A tibble: 2 x 2
##   cluster     n
##   <fct>   <int>
## 1 1         368
## 2 2         474

Which users act as “bridges”?

ego_network %>% 
  as_tibble() %>% 
  arrange(desc(betweenness)) %>% 
  select(name, description, location)
## # A tibble: 842 x 3
##    name        description                                       location       
##    <chr>       <chr>                                             <chr>          
##  1 RAKarl      "Colombia past + present; author of #ForgottenPe… "Tierra Fría"  
##  2 malbarracin "Santandereano en el exilio. Abogado, activista … "Bogotá"       
##  3 infrahumano ""                                                "Toronto"      
##  4 SergioChap… "Conspiring for a #RightsBasedEconomy at @social… "Brooklyn, NY" 
##  5 cblatts     "@UChicago political economist studying violence… "Chicago, IL"  
##  6 causalinf   "Economist slacker. Can’t remember if he ever pu… "Waco, Texas"  
##  7 MajoAlRiv   "Profesora | Socióloga | Desigualdad | Ciudades … ""             
##  8 AOC         "US Representative,NY-14 (BX & Queens). In a mod… "Bronx + Queen…
##  9 Undercover… "Historian of postwar economics @CNRS/CREST @Xde… "where Delorea…
## 10 alondra     "President @SSRC_org. Harold F. Linder Professor… "Gotham and Pr…
## # … with 832 more rows

cols <- c("betweenness", "in_degree", "out_degree", "followers_count", "friends_count")

ego_network %>% 
  as_tibble() %>% 
  group_by(cluster) %>% 
  summarize(across(all_of(cols), mean)) %>% 
  arrange(desc(betweenness))
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 2 x 6
##   cluster betweenness in_degree out_degree followers_count friends_count
##   <fct>         <dbl>     <dbl>      <dbl>           <dbl>         <dbl>
## 1 2             1304.      72.4       76.5          94460.         1593.
## 2 1             1295.      35.9       30.6          95043.         1309.

Subset

ego_network_subset <- ego_network %>% 
  filter(!is.na(cluster)) %>% 
    mutate(
    out_degree = centrality_degree(mode = "out"),
    in_degree = centrality_degree(mode = "in"),
    betweenness = centrality_betweenness(directed = TRUE),
    authority_score = centrality_authority(),
    eigen_centrality = centrality_eigen(directed = TRUE)
  ) 

ego_network_subset %>% 
  ggraph("mds") +
  geom_edge_fan(alpha = 1/5, width = 1/5) + 
  geom_node_point(aes(fill = cluster, size = in_degree), 
                  shape = 21, color = "white", show.legend = FALSE) 

ego_network_subset %>% 
  as_tibble() %>% 
  mutate(label_id = ifelse(
    test = rank(-betweenness) <= 10 |rank(-in_degree) <= 10, 
    yes = name, 
    no = NA_character_)
    ) %>% 
  ggplot(aes(betweenness, in_degree, color = cluster)) +
  geom_point() +
  ggrepel::geom_label_repel(aes(label = label_id), size = 3)

ego_network_subset %>% 
  group_by(cluster) %>% 
  mutate(label_name = ifelse(
    test = rank(-authority_score) <= 5 | rank(-betweenness) <= 5,
    yes = name,
    no = NA_character_
  )) %>% 
  ggraph("mds") +
  geom_edge_fan(alpha = 1/5, width = 1/5) + 
  geom_node_point(aes(fill = cluster, size = betweenness), 
                  shape = 21, color = "white", show.legend = FALSE) +
  geom_node_label(aes(label = label_name), 
                  repel = TRUE, alpha = 3/4, size = 3) 
## Ungrouping graph...

Mutuals network

Personal network

edge_list <- list.files(paste0(ego, "_friends_of_friends/"), full.names = TRUE) %>% 
  map(read_rds) %>% 
  discard(~ inherits(.x, "try-error")) %>%  ## skip users whose download failed
  bind_rows()

edge_list_mutual <- inner_join(
  edge_list, 
  edge_list %>% rename(from = to, to = from)
  ) %>% 
  filter(from %in% ego_followers$user_id, to %in% ego_followers$user_id) %>%
  filter(from %in% ego_friends$user_id, to %in% ego_friends$user_id) %>%
  filter(from %in% to, to %in% from)
## Joining, by = c("from", "to")
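
The join above keeps an edge only when the reverse edge also exists. A minimal sketch with a hypothetical five-row edge list makes this visible:

```r
library(dplyr)

# Hypothetical directed edge list: "from follows to"
edges <- tibble(
  from = c("a", "b", "a", "c", "d"),
  to   = c("b", "a", "c", "d", "c")
)

# The same edges with the direction flipped
reversed <- edges %>% rename(from = to, to = from)

# An edge survives the join only if its reverse is also in the list
mutual_edges <- inner_join(edges, reversed, by = c("from", "to"))

print(mutual_edges)
```

The one-way edge from "a" to "c" is dropped, while each reciprocal pair appears twice, once per direction. That duplication is harmless when the result is later turned into an undirected graph.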

mat <- edge_list_mutual %>% 
  mutate(n = 1) %>% 
  tidytext::cast_sparse(from, to, n) %>% 
  as.matrix()

mat <- mat[colnames(mat), ]

mutual_network <- mat %>% 
  graph_from_adjacency_matrix(mode = "undirected") %>% 
  tidygraph::as_tbl_graph() 

mutual_network
## # A tbl_graph: 293 nodes and 3267 edges
## #
## # An undirected simple graph with 2 components
## #
## # Node Data: 293 x 1 (active)
##   name              
##   <chr>             
## 1 148507300         
## 2 362101597         
## 3 880165015780827136
## 4 813222199         
## 5 48253393          
## 6 77232869          
## # … with 287 more rows
## #
## # Edge Data: 3,267 x 2
##    from    to
##   <int> <int>
## 1     1     2
## 2     1     4
## 3     1     5
## # … with 3,264 more rows

ego_mutuals_info <- lookup_users(as_tibble(mutual_network)$name, token = sample(token, 1))

ego_mutuals_info <- ego_mutuals_info %>% 
  filter(!protected) %>% 
  select(
    user_id, screen_name, lang, name, location, description,
    ends_with("count"), -starts_with("quote"), 
    -starts_with("retweet"), -reply_count,
    -starts_with("fav")
    ) %>% 
    rename(name = user_id, user_name = name)

mutual_network <- mutual_network %>% 
  inner_join(ego_mutuals_info) %>% 
  rename(name = screen_name, user_id = name) %>% 
  select(name, everything())
## Joining, by = "name"

## Descriptive statistics

mutual_network <- mutual_network %>% 
  mutate(
    degree = centrality_degree(),
    betweenness = centrality_betweenness(directed = TRUE),
    authority_score = centrality_authority(),
    eigen_centrality = centrality_eigen(directed = TRUE)
  )

The following plot shows each user's influence on Twitter as a whole (horizontal axis, number of followers) versus each user's influence within the personal network of mutuals (vertical axis, degree).

mutual_network %>% 
  as_tibble() %>% 
  ggplot(aes(followers_count, degree)) + 
  geom_point() 

mutual_network %>% 
  as_tibble() %>% 
  mutate(label_name = ifelse(
    test = rank(-followers_count) <= 15 | rank(-degree) <= 15, 
    yes = name, 
    no = NA_character_)
    ) %>% 
  ggplot(aes(followers_count, degree)) + 
  geom_point() + 
  ggrepel::geom_label_repel(aes(label = label_name), size = 3)

Clusters

clusters <- igraph::cluster_louvain(graph = mutual_network)

cluster_df <- tibble(cluster = factor(clusters$membership), name = clusters$names) 
  
cluster_df <- cluster_df %>% 
  group_by(cluster) %>% 
  filter(n() >= 10) %>% 
  ungroup()

mutual_network <- mutual_network %>% 
  left_join(cluster_df)
## Joining, by = "name"

mutual_network %>% 
  as_tibble() %>% 
  arrange(desc(degree)) %>%
  filter(!is.na(cluster)) %>% 
  group_by(cluster) %>%
  filter(rank(-authority_score) <= 40) %>% 
  ggplot(aes(label = name, size = log(degree), color = degree)) + 
  geom_text_wordcloud_area(family = "Avenir Next Condensed") + 
  facet_wrap(~cluster) + 
  labs(title = "Prominent mutuals in each cluster") + 
  scale_color_gradient(low = "grey", high = "purple") 

Size of each cluster:

mutual_network %>% as_tibble() %>% count(cluster)
## # A tibble: 5 x 2
##   cluster     n
##   <fct>   <int>
## 1 1          66
## 2 3         113
## 3 4          46
## 4 5          63
## 5 <NA>        5

Which users act as “bridges”?

mutual_network %>% 
  as_tibble() %>% 
  arrange(desc(betweenness)) 
## # A tibble: 293 x 15
##    name  user_id lang  user_name location description followers_count
##    <chr> <chr>   <chr> <chr>     <chr>    <chr>                 <int>
##  1 RAKa… 572136… en    Robert A… "Tierra… Colombia p…            9447
##  2 malb… 482533… es    Mauricio… "Bogotá" Santandere…           39663
##  3 Emil… 464906… en    Emilio L… ""       Ph.D candi…            2157
##  4 emil… 227530… en    Emily Ma… "Athens… UGA '18; D…             452
##  5 Marg… 491309… es    Margarit… "Colomb… Estudiante…             757
##  6 gene… 141650… en    (wannabe… "Palo A… Lapsed com…           18632
##  7 Majo… 878152… es    María Jo… ""       Profesora …            3344
##  8 Maca… 249998… en    María-Cl… "Stony … Ph.D. Coca…            3780
##  9 Serg… 476898… es    Sergio C… "Brookl… Conspiring…            2894
## 10 Mari… 766789… und   Maria An… "Bogotá… Abogada, M…            4882
## # … with 283 more rows, and 8 more variables: friends_count <int>,
## #   listed_count <int>, statuses_count <int>, degree <dbl>, betweenness <dbl>,
## #   authority_score <dbl>, eigen_centrality <dbl>, cluster <fct>

cols <- c("betweenness", "degree", "followers_count", "friends_count")

mutual_network %>% 
  as_tibble() %>% 
  group_by(cluster) %>% 
  summarize(across(all_of(cols), mean)) %>% 
  arrange(desc(betweenness))
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 5 x 5
##   cluster betweenness degree followers_count friends_count
##   <fct>         <dbl>  <dbl>           <dbl>         <dbl>
## 1 4              314.   6.93           3167.         1709.
## 2 3              298.  26.0            6180.         2401.
## 3 5              178.  31.1            6692.         1409.
## 4 1              127.  19.8             681.          695.
## 5 <NA>           116.   1.8             335           732

Subset

mutual_network_subset <- mutual_network %>% 
  filter(!is.na(cluster)) %>% 
  mutate(
    degree = centrality_degree(),
    betweenness = centrality_betweenness(directed = TRUE),
    authority_score = centrality_authority(),
    eigen_centrality = centrality_eigen(directed = TRUE)
  ) 

mutual_network_subset %>% 
  ggraph("mds") +
  geom_edge_fan(alpha = 1/5, width = 1/5) + 
  geom_node_point(aes(fill = cluster, size = degree), 
                  shape = 21, color = "white", show.legend = FALSE) 

mutual_network_subset %>% 
  as_tibble() %>% 
  mutate(label_id = ifelse(
    test = rank(-betweenness) <= 10 | rank(-degree) <= 10, 
    yes = name, 
    no = NA_character_)
    ) %>% 
  ggplot(aes(betweenness, degree, color = cluster)) +
  geom_point() +
  ggrepel::geom_label_repel(aes(label = label_id), size = 3)

mutual_network_subset %>% 
  group_by(cluster) %>% 
  mutate(label_name = ifelse(
    test = rank(-degree) <= 5 | rank(-betweenness) <= 5,
    yes = name,
    no = NA_character_
  )) %>% 
  ggraph("mds") +
  geom_edge_fan(alpha = 1/5, width = 1/5) + 
  geom_node_point(aes(fill = cluster, size = betweenness), 
                  shape = 21, color = "white", show.legend = FALSE) +
  geom_node_label(aes(label = label_name), 
                  repel = TRUE, alpha = 3/4, size = 3) 
## Ungrouping graph...

Additional functions

readLines("rtweet_functions.R") %>% 
  writeLines()
## 
## # main functions ----------------------------------------------------------
## 
## multi_get_friends <- function(u, token_list) {
##   
##   user_info <- lookup_users(u, token = sample(token_list, 1)[[1]])
##   fc <- user_info$friends_count
##   message("<<", user_info$screen_name, ">> is following ", scales::comma(fc), " users ")
##   
##   if (user_info$protected) stop(call. = FALSE, "The account is protected, we can't get friends.")
##   
##   num_queries <- ceiling(fc / 5000)
##   rl <- rate_limit(token_list, "get_friends")
##   rl <- validate_rate_limit(rl, "get_friends", token_list)
##   
##   index <- get_available_token_index(rl)
##   
##   # Case 0: User doesn't have any friends
##   
##   if (fc == 0) return(tibble(from = character(0), to = character(0))) 
##   
##   # Case 1: At most 5,000 friends, only one call is needed
##   
##   if (fc <= 5e3) {
##     
##     friends <- get_friends(u, token = token_list[[index]])
##     
##   } else {
##     
##     # Case 2: Many calls are needed
##     
##     output <- vector("list", length = num_queries)
##     output[[1]] <- get_friends(u, token = token_list[[index]])
##     
##     for (i in 2:length(output)) {
##       
##       rl <- validate_rate_limit(rl, "get_friends", token_list)
##       index <- get_available_token_index(rl)
##       output[[i]] <- get_friends(u, token = token_list[[index]], page = next_cursor(output[[i - 1]]))
##       
##     }
##     
##     friends <- bind_rows(output) %>% 
##       distinct()
##     
##   }
##   
##   attr(friends, "next_cursor") <- NULL
##   
##   friends %>% 
##     rename(from = user, to = user_id) %>% 
##     mutate(from = user_info$user_id)
##   
## }
## 
## multi_get_timeline <- function(u, n, token_list, home = FALSE) {
##   
##   message(u)
##   rl <- rate_limit(token_list, "get_timeline")
##   rl <- validate_rate_limit(rl, "get_timeline", token_list)
##   
##   index <- get_available_token_index(rl)
##   
##   # Case 0: User doesn't have any posts
##   
##   # what to do?
##   
##   # Should we allow to get all the timeline??? If so, mimic previous function
##     
##   tl <- get_timeline(u, n = n, home = home, token = token_list[[index]])
## 
##   return(tl)
##   
## }
## 
## # multi_lookup_users <- function() {
## #   
## #   
## # }
## 
## 
## # helpers -----------------------------------------------------------------
## 
## validate_rate_limit <- function(rl, q, token_list) {
##   
##   if (is_empty(rl)) {
##     message("Waiting for rate limiting update")
##     Sys.sleep(60)
##     rl <- rate_limit(token_list, query = q)
##     return(validate_rate_limit(rl, q, token_list)) # recursion! return the validated result
##     
##   }
##   
##   if (all(rl$remaining == 0)) {
##     
##     message("Waiting for token reset in ", round(min(rl$reset), 1), " minutes")
##     Sys.sleep(min(as.numeric(rl$reset_at - Sys.time(), units = "secs")) + 5)
##     rl <- rate_limit(token_list, query = q)
##     return(validate_rate_limit(rl, q, token_list)) # recursion! return the validated result
##     
##   }
##   
##   rl
##   
## }
## 
## get_available_token_index <- function(rl) {
##   
##   env <- rlang::caller_env()
##   available_token <- rl$remaining > 0
##   index <- which(available_token)[[1]]
##   env$rl[index, ]$remaining <- rl[index, ]$remaining - 1  # this modifies the rl obj in the parent frame
##   return(index)
##   
## }
theme_custom
## function (base_family = "Avenir Next Condensed", fill = "white", ...) {
##     theme_minimal(base_family = base_family, ...) %+replace% 
##         theme(plot.title = element_text(face = "bold", margin = margin(0, 
##             0, 5, 0), hjust = 0, size = 13), plot.subtitle = element_text(face = "italic", 
##             margin = margin(0, 0, 5, 0), hjust = 0), plot.background = element_rect(fill = fill, 
##             size = 0), complete = TRUE, axis.title.x = element_text(margin = margin(15, 
##             0, 0, 0)), axis.title.y = element_text(angle = 90, 
##             margin = margin(0, 20, 0, 0)), strip.text = element_text(face = "italic", 
##             colour = "white"), strip.background = element_rect(fill = "#4C4C4C"))
## }
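
The helper `get_available_token_index()` printed above picks the first token with calls left and decrements its counter by modifying `rl` in the caller's environment. As a minimal sketch of the same idea without that side effect, the function below returns the updated table together with the index. The name `pick_token()` and the toy `rl` data frame are hypothetical. The real `rate_limit()` output has more columns, and only `remaining` is mimicked here.

```r
# Hypothetical rate-limit table: one row per token, mimicking only the
# `remaining` column used by the helpers above.
rl <- data.frame(token = 1:3, remaining = c(0, 2, 1))

# Pick the first token with calls left and return the decremented table
# alongside the index, instead of editing the caller's copy in place.
pick_token <- function(rl) {
  available <- which(rl$remaining > 0)
  if (length(available) == 0) {
    stop("No token has remaining calls.", call. = FALSE)
  }
  index <- available[[1]]
  rl$remaining[index] <- rl$remaining[index] - 1
  list(index = index, rl = rl)
}

picked <- pick_token(rl)
rl <- picked$rl  # the caller updates its own copy explicitly
```

Returning the table makes the bookkeeping visible at the call site, at the cost of one extra assignment per request.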